From 9ca81ed02a3aaa8429ea3ce5b4fef373f957aad4 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 30 May 2023 15:49:52 -0400
Subject: [PATCH] update

---
 doc/todo/speed_up_import_tree.mdwn | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/todo/speed_up_import_tree.mdwn b/doc/todo/speed_up_import_tree.mdwn
index 6f761a4905..1e1ca45624 100644
--- a/doc/todo/speed_up_import_tree.mdwn
+++ b/doc/todo/speed_up_import_tree.mdwn
@@ -10,7 +10,9 @@ own scalability limits with many files.)
 
 Still, it would be good to find some ways to speed it up.
 
-Hmm... What if it generated a git tree, where each file in the tree is
+---
+
+What if it generated a git tree, where each file in the tree is
 a sha1 hash of the ContentIdentifier. The tree can just be recorded
 locally somewhere. It's ok if it gets garbage collected; it's only an
 optimisation. On the next sync, diff from the old to the new tree. It only needs to
@@ -22,6 +24,11 @@ reasonable, because git loses data on sha1 collisions anyway, and ContentIdentif
 are no more likely to collide than the content of files, and probably less
 likely overall..)
 
+How fast can a git tree of say, 10000 files be generated? Is it faster than
+querying sqlite 10000 times?
+
+----
+
 Another idea would to be use something faster than sqlite to record the cid
 to key mappings. Looking up those mappings is the main thing that makes
 import slow when only a few files have changed and a large number have not.
-- 
2.30.2
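
As a rough sketch of the tree idea in the patch above (not git-annex code; the cid values and helper names below are made up), one way to try it out is to feed `git mktree --missing` entries whose blob sha1 is computed from the ContentIdentifier, so no blobs ever need to be written, and then let `git diff-tree` report only the paths whose cid changed:

```python
# Sketch only; assumes it is run inside an existing git repository.
# Each path maps to its ContentIdentifier; the blob sha1 recorded in the
# tree is the sha1 git *would* give a blob containing that cid, so a file's
# tree entry changes exactly when its cid changes.
import hashlib
import subprocess

def blob_sha1(content: bytes) -> str:
    # sha1 of a git blob object with this content (object is never written)
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()

def cid_tree(cids: dict[str, bytes]) -> str:
    # Build a tree object from {path: ContentIdentifier} entries.
    # --missing lets mktree record blob sha1s for objects that do not exist.
    entries = "".join(
        f"100644 blob {blob_sha1(cid)}\t{path}\n"
        for path, cid in sorted(cids.items())
    )
    return subprocess.run(
        ["git", "mktree", "--missing"],
        input=entries, capture_output=True, text=True, check=True,
    ).stdout.strip()

def changed_paths(old_tree: str, new_tree: str) -> list[str]:
    # Paths whose ContentIdentifier differs between the two trees.
    out = subprocess.run(
        ["git", "diff-tree", "-r", "--name-only", old_tree, new_tree],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

if __name__ == "__main__":
    old = cid_tree({"a.txt": b"cid:1111", "b.txt": b"cid:2222"})
    new = cid_tree({"a.txt": b"cid:1111", "b.txt": b"cid:3333"})
    print(changed_paths(old, new))  # only b.txt's cid changed -> ['b.txt']
```

Since `git diff-tree` only compares the sha1s recorded in the two tree objects, the sqlite cid-to-key lookup would then only be needed for the paths it reports, rather than once per imported file.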